R version 3.6
libraries: stringr, ape, ggplot2, ggtree
Note that the program comprises two different procedures in general: the first is simulation and the second is the analysis of the simulation results. This User-Guide pertains to the analysis alone. The programs for the analysis can be run only after the simulation is completed, and the cellout.txt file exists in the tugHall/Output/ folder.
To perform the simulation, kindly see the User-Guide-tugHall file. After the simulation the file tugHall/Output/cellout.txt is generated, which is used to analyze the evolution of cells. Also, since the functions and objects are used after the simulation, the .RData file saved after the simulation must be loaded, if required.
The simplest way to analyze the results after simulation:
source("Code/Analysis.R").The code has initial input parameters and input files in the /Input/ folder to define the names of the genes. In the dialogue box, the user can see the results of the simulation, which will be saved to the /Output/ and /Figures/ folders.
To analyze the output data, the user has to obtain the results of the simulation in the tugHall/Output/cellout.txt file and the functions and objects of simulations should be present in the R environment. That is why the cellout.txt file is the input file for the analysis. For detailed information, kindly see the “Outputs” section in User-Guide-tugHall.
Output data contain several files and figures:
tugHall/Output/Order_of_dysfunction.txt has information about the order of gene dysfunction during evolution in the next format (only first 10 lines are presented here):
| Order of gene dysfunction | Number of cells |
|---|---|
| APC -> TP53 -> PIK3CA -> KRAS | 41006 |
| 1720 | |
| PIK3CA -> TP53 -> KRAS -> APC | 396 |
| APC -> TP53 -> PIK3CA | 20 |
| PIK3CA -> TP53 -> KRAS | 20 |
| APC | 13 |
| PIK3CA | 10 |
| TP53 -> PIK3CA -> APC | 4 |
| TP53 | 2 |
| KRAS | 1 |
tugHall/Output/VAF.txt file has information about the VAFs for each gene and each site in the genes (first 10 lines):
| DriverPasngr | Gene | Position | VAF_Primary | Ncells_Primary_wMutation | Ncells_Primary | VAF_Metastatic | Ncells_Metastatic_wMutation | Ncells_Metastatic | VAF_PriMet | Ncells_PriMet_wMutation | Ncells_PriMet |
|---|---|---|---|---|---|---|---|---|---|---|---|
| D | APC | 3 | 0 | 0 | 1793 | 0.0000845 | 7 | 41400 | 0.0000810 | 7 | 43193 |
| D | APC | 7 | 0 | 0 | 1793 | 0.0001449 | 12 | 41400 | 0.0001389 | 12 | 43193 |
| D | APC | 8 | 0 | 0 | 1793 | 0.0000242 | 2 | 41400 | 0.0000232 | 2 | 43193 |
| D | APC | 9 | 0 | 0 | 1793 | 0.0000242 | 2 | 41400 | 0.0000232 | 2 | 43193 |
| D | APC | 10 | 0 | 0 | 1793 | 0.0000121 | 1 | 41400 | 0.0000116 | 1 | 43193 |
| D | APC | 15 | 0 | 0 | 1793 | 0.0000121 | 1 | 41400 | 0.0000116 | 1 | 43193 |
| D | APC | 19 | 0 | 0 | 1793 | 0.0000121 | 1 | 41400 | 0.0000116 | 1 | 43193 |
| D | APC | 20 | 0 | 0 | 1793 | 0.0021981 | 182 | 41400 | 0.0021068 | 182 | 43193 |
| D | APC | 22 | 0 | 0 | 1793 | 0.0001691 | 14 | 41400 | 0.0001621 | 14 | 43193 |
| D | APC | 23 | 0 | 0 | 1793 | 0.0000121 | 1 | 41400 | 0.0000116 | 1 | 43193 |
DriverPasngr - D or P indicate the distinction between the Driver and Passenger genes.
Gene - name of gene, e.g. TP53, KRAS.
Position - position at mutated site in the gene, e.g. 123, 1028.
VAF_Primary - VAF for cells in the primary tumor = half of Ncells_Primary_wMutation / Ncells_Primary, e.g. 0.2.
Ncells_Primary_wMutation - number of primary-tumor cells, e.g. , 40.
Ncells_Primary - number of primary-tumor cells, e.g. 100.
VAF_Metastatic VAF for metastatic cells = half of Ncells_Metastatic_wMutation / Ncells_Metastatic, e.g. 0.35.
Ncells_Metastatic_wMutation = number of metastatic cells with the mutation, e.g. 70.
Ncells_Metastatic = number of metastatic cells, e.g. 100.
VAF_PriMet VAF for all cells = half of Ncells_PriMet_wMutation / Ncells_PriMet, e.g. 0.275
Ncells_PriMet_wMutation number of all mutated cells, e.g. 110.
Ncells_PriMet number of all cells, e.g. 200.
The directory Figures/ contains many output figures, generated during the analysis process of cellout.txt file, including the evolution of the number of primary tumors and metastasis cells (Fig.1 left), hallmarks (Fig.1 right), and probabilities (Fig.2 left).
| Files | Description |
|---|---|
| Barplot_N_cells_in_clones_DP.jpg | Barplot for number of cells in each clone for passengers and drivers |
| Barplot_N_cells_in_clones.jpg | Barplot for number of cells in each clone for drivers only |
| Hallmarks.jpg | Time evolution of the average hallmark values |
| Inequality_all_cells.jpg | Dependence of the inequality coefficient on number of all cells |
| Inequality_metastasis.jpg | Dependence of the inequality coefficient on number of metastasis cells |
| Inequality_primary.jpg | Dependence of the inequality coefficient on number of primary tumor cells |
| Inequality.jpg | Time evolution of the inequality coefficient |
| N_cells_in_clones_1.jpg | Time evolution of number of cells in each clone |
| N_cells_in_clones_2.jpg | Time evolution of number of cells in each clone using different scale |
| N_cells.jpg | Time evolution of the number of cells |
| N_clones.jpg | Time evolution of the number of clones |
| Probabilities.jpg | Time evolution of probabilities |
| Tree_clones.jpg | Tree of the clones |
| ggtree_clones.jpg | Tree of the clones using ggtree package |
Fig.1. Results of the simulation: left - evolution of number of cells, right - evolution of hallmarks. Files are tugHall/Figures/N_cells.jpg and tugHall/Figures/Hallmarks.jpg.
Fig.2. Results of the simulation: left - evolution of probabilities, right - evolution of number of clones. Files are tugHall/Figures/Probabilities.jpg and tugHall/Figures/N_clones.jpg.
The right side Fig.2 shows the evolution of the number of clones. Here, we have to define the clone in the simulation, which is a pool of cells with a same set of mutated genes either for the driver alone or for all mutations (driver and passenger genes together). For this propose, we define the clone ID as the binary number of mutated genes, please, see cellout.txt file for columns “Clone.number”, “Passengers.Clone.number” and “Mix.Clone.number”. Also, the analysis can calculate the evolution of number of cells in each clone:
Fig.3. Results of the simulation: left - evolution of number of cells in clones, right - evolution of number of cells in clones, magnified. Files are tugHall/Figures/N_cells_in_clones_1.jpg and tugHall/Figures/N_cells_in_clones_2.jpg.
Fig.4 shows the number of cells in each clone at the last time step to see which clone is dominant and prevails above the others:
Fig.4. Results of the simulation: left - barplot for number of cells in clones for driver mutated clone, right - same plot for all mutated cells. Files are tugHall/Figures/Barplot_N_cells_in_clones.jpg and tugHall/Figures/Barplot_N_cells_in_clones_DP.jpg.
During the simulation each cell has “clone ID”, which is calculated from the binary code of the mutated genes. For example, if a gene is mutated, then its value in binary code is designated as 1, and if not, it is set to 0. For example, the cells have only 4 genes in simulation, and hence, the “clone ID” can have binary numbers from 0000 to 1111, which are the associated decimal numbers from 0 to 15. Each cell has information about the parent ID, and so, time of birthday, so it is possible to calculate cells having the same “clone ID”, Subsequently, the cell with the earliest birthday is identified and its parent ID extracted. This parent ID can be used to find the “clone ID” for the parent, which is related to the parent of clone. Using this procedure, we can find all the relations between the parent and children for the clones and construct the tree for the clones (cells) at last time step (Fig.5).
Fig.5. Tree of clones at the last time step. The numbers of the tree indicate the clone IDs (upper is the plot using the plotTreeTime function from the ape package, and lower is the plot using the ggtree package). The files are tugHall/Figures/Tree_clones.jpg and tugHall/Figures/ggtree_clones.jpg.
Figs.6 and 7 show the inequality coefficients (for driver mutated cells and for any type mutation) as a function of:
time step;
number of all cells;
number of primary tumor cells;
number of metastasis cells.
Fig.6. Results of simulation of inequality coefficient for driver mutated cells and for cells with any type of mutations: left - evolution of inequality coefficient, right - inequality coefficient as a function of all cells. Files are tugHall/Figures/Inequality.jpg and tugHall/Figures/Inequality_all_cells.jpg.
Fig.7. Results of simulation of inequality coefficient for driver mutated cells and for cells with any type of mutations: left - inequality coefficient as the function of primary tumor cells, right - inequality coefficient as the function of metastasis cells. Files are tugHall/Figures/Inequality_primary.jpg and tugHall/Figures/Inequality_metastasis.jpg.